Week 1 - Inspiring Visualizations

This notebook collects a few inspiring visualizations that can be created quickly with Python. The goal is to give you ideas and examples of how to visualize data effectively. The visualizations use real-world datasets and demonstrate several Python techniques and libraries for data visualization.

Population Density Map

This section is based on the following tutorial: https://medium.com/data-science/creating-beautiful-population-density-maps-with-python-fcdd84035e06. Full credit to the author, Adam Symington (2022), at PythonMaps.

In [145]:
# import all relevant libraries
import rasterio
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib.colors import ListedColormap
In [146]:
# Import geographical data from the GHS-POP dataset with rasterio
tif_file = rasterio.open('data/GHS_POP_E2030_GLOBE_R2023A_4326_30ss_V1_0/GHS_POP_E2030_GLOBE_R2023A_4326_30ss_V1_0.tif')
ghs_data = tif_file.read()
In [147]:
# Print some metadata
print("Tiff Boundary", tif_file.bounds)
print("Tiff CRS", tif_file.crs)
print("Data shape", ghs_data.shape)
print("Max value", np.amax(ghs_data))
print("Min value", np.amin(ghs_data))
Tiff Boundary BoundingBox(left=-180.00791593130032, bottom=-89.10041610517152, right=180.00874930942342, top=89.0995831776456)
Tiff CRS EPSG:4326
Data shape (1, 21384, 43202)
Max value 394367.2599414062
Min value 0.0
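The shape printed above corresponds to roughly 924 million grid cells, which is why the full-resolution plot further down is slow to render. For a quick preview, one option is to aggregate the grid into coarser blocks first. The sketch below is not part of the original tutorial: `downsample_sum` is a hypothetical helper, and the small `toy_grid` stands in for `ghs_data[0]`. It sums each block rather than averaging it, on the assumption that the cells hold population counts, so totals are preserved.

```python
import numpy as np

def downsample_sum(grid, factor):
    """Aggregate a 2D grid by summing non-overlapping factor x factor blocks."""
    rows, cols = grid.shape
    # Crop trailing rows/columns that do not fill a complete block
    rows_used = (rows // factor) * factor
    cols_used = (cols // factor) * factor
    cropped = grid[:rows_used, :cols_used]
    # Give each block its own pair of axes, then sum over those axes
    blocks = cropped.reshape(rows_used // factor, factor, cols_used // factor, factor)
    return blocks.sum(axis=(1, 3))

# Toy stand-in for ghs_data[0]: a 4 x 6 grid of made-up counts
toy_grid = np.arange(24, dtype=float).reshape(4, 6)
coarse = downsample_sum(toy_grid, factor=2)

print(coarse.shape)                    # grid is now 2 x 3
print(coarse.sum() == toy_grid.sum())  # the total is unchanged
```

Applied to the real data, `downsample_sum(ghs_data[0], factor=4)` would return a 5346 × 10800 grid. Trailing rows or columns that do not fill a complete block are cropped, so the map extent shrinks slightly at the edge.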
In [148]:
# Create a custom colormap for the population density
our_cmap = matplotlib.colormaps.get_cmap('hot_r') # Get the continuous 'hot_r' colormap
new_colors = our_cmap(np.linspace(0, 1, 394)) # Sample 394 discrete colors from it
new_cmp = ListedColormap(new_colors) # Create a new discrete colormap
In [149]:
# Create a new figure and axis at very high resolution (dpi=600 makes rendering very slow)
fig, ax = plt.subplots(figsize=(20,10), dpi=600)
ax.imshow(ghs_data[0], norm=colors.LogNorm(), cmap=new_cmp) # Create the final plot
ax.axis('off') # Remove axis to make it look cleaner
fig.savefig("pop_density_global.png", dpi=600) # optionally save the figure
plt.show() # Show the figure
[Figure: global population density map]
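The plot relies on `colors.LogNorm()` because population values span several orders of magnitude (from 0 up to roughly 394,000 per cell in the metadata above). The small sketch below, using made-up counts, illustrates what the normalization does: each factor of ten covers an equal share of the color range, and zero-valued cells are masked because the logarithm of zero is undefined. The explicit `vmin` and `vmax` are an assumption for illustration only; the notebook lets `LogNorm` infer them from the data.

```python
import numpy as np
import matplotlib.colors as colors

# Hypothetical population counts, including one empty cell (0)
counts = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])

# Explicit limits for illustration; the notebook call passes none
norm = colors.LogNorm(vmin=1, vmax=1000)
scaled = norm(counts)  # masked array with values scaled to the range 0..1

print(scaled)
```

Masked cells are drawn in the colormap's "bad" color, which should be transparent by default, so empty areas such as oceans take on the figure background.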

Climate Stripes

This section recreates the iconic climate stripes visualization. The original visualization was created by Highwood and Hawkins (2017) at the University of Reading. The idea behind the climate stripes is to represent temperature anomalies over time using a simple layout and a visually striking color gradient.

In [6]:
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
In [9]:
# Load data
temperature_data = pd.read_csv("data/GlobalLandTemperaturesByCountry.csv", parse_dates=['dt'])
global_temperature = pd.read_csv("data/GlobalTemperatures.csv", parse_dates=['dt'])
In [20]:
# Let's first look at some basic information about the data
print("Temperature data shape:", temperature_data.shape)
print("Temperature data columns:", temperature_data.columns)
print("Temperature data datatypes \n", temperature_data.dtypes)
print("Unique countries:", temperature_data['Country'].nunique())
print("Temperature data first rows:\n", temperature_data.head())
Temperature data shape: (577462, 4)
Temperature data columns: Index(['dt', 'AverageTemperature', 'AverageTemperatureUncertainty', 'Country'], dtype='object')
Temperature data datatypes 
 dt                               datetime64[ns]
AverageTemperature                      float64
AverageTemperatureUncertainty           float64
Country                                  object
dtype: object
Unique countries: 243
Temperature data first rows:
           dt  AverageTemperature  AverageTemperatureUncertainty Country
0 1743-11-01               4.384                          2.294   Åland
1 1743-12-01                 NaN                            NaN   Åland
2 1744-01-01                 NaN                            NaN   Åland
3 1744-02-01                 NaN                            NaN   Åland
4 1744-03-01                 NaN                            NaN   Åland
In [21]:
temperature_data
Out[21]:
               dt  AverageTemperature  AverageTemperatureUncertainty   Country
0      1743-11-01               4.384                          2.294     Åland
1      1743-12-01                 NaN                            NaN     Åland
2      1744-01-01                 NaN                            NaN     Åland
3      1744-02-01                 NaN                            NaN     Åland
4      1744-03-01                 NaN                            NaN     Åland
...           ...                 ...                            ...       ...
577457 2013-05-01              19.059                          1.022  Zimbabwe
577458 2013-06-01              17.613                          0.473  Zimbabwe
577459 2013-07-01              17.000                          0.453  Zimbabwe
577460 2013-08-01              19.759                          0.717  Zimbabwe
577461 2013-09-01                 NaN                            NaN  Zimbabwe

577462 rows × 4 columns

In [24]:
# Filter and preprocess country-level temperature data
excluded_countries = [
    "Africa", "Asia", "Baker Island", "Europe", "North America", "Saint Martin",
    "Palmyra Atoll", "Virgin Islands", "South America", "Oceania", "Kingman Reef"
]

temperature_data = temperature_data[
    (temperature_data['dt'] >= pd.to_datetime("1750-01-01")) &
    (~temperature_data['Country'].isin(excluded_countries))
].copy()

temperature_data['year'] = temperature_data['dt'].dt.year
temperature_data['month'] = temperature_data['dt'].dt.month
temperature_data.dropna(inplace=True)
In [29]:
# Filter for Italy
italy = temperature_data[temperature_data['Country'] == 'Italy']

# Preprocess global data
global_temperature['year'] = global_temperature['dt'].dt.year
global_temperature['month'] = global_temperature['dt'].dt.month
global_temperature = global_temperature[['dt', 'year', 'month', 'LandAverageTemperature']].dropna()

# Calculate annual means
annual_means = global_temperature.groupby('year')['LandAverageTemperature'].mean().reset_index()
italy_annual_means = italy.groupby('year')['AverageTemperature'].mean().reset_index()

# Align years: use the last year available in both datasets
end_year = min(italy_annual_means['year'].max(), annual_means['year'].max())

# Calculate baseline mean (1971–2000)
baseline_mean = annual_means[
    (annual_means['year'] >= 1971) & (annual_means['year'] <= 2000)
]['LandAverageTemperature'].mean()

annual_anomalies = annual_means[
    (annual_means['year'] >= 1870) & (annual_means['year'] <= end_year)
].copy()
annual_anomalies['TemperatureAnomaly'] = annual_anomalies['LandAverageTemperature'] - baseline_mean
annual_anomalies
Out[29]:
     year  LandAverageTemperature  TemperatureAnomaly
120  1870                8.201333           -0.723700
121  1871                8.115083           -0.809950
122  1872                8.193833           -0.731200
123  1873                8.351083           -0.573950
124  1874                8.433500           -0.491533
...   ...                     ...                 ...
259  2009                9.505250            0.580217
260  2010                9.703083            0.778050
261  2011                9.516000            0.590967
262  2012                9.507333            0.582300
263  2013                9.606500            0.681467

144 rows × 3 columns

In [30]:
# Repeat the baseline and anomaly calculation for Italy
italy_baseline_mean = italy_annual_means[
    (italy_annual_means['year'] >= 1971) & (italy_annual_means['year'] <= 2000)
]['AverageTemperature'].mean()

italy_annual_anomalies = italy_annual_means[
    (italy_annual_means['year'] >= 1870) & (italy_annual_means['year'] <= end_year)
].copy()
italy_annual_anomalies['TemperatureAnomaly'] = italy_annual_anomalies['AverageTemperature'] - italy_baseline_mean
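The world and Italy cells above follow the same recipe: average a reference period (1971 to 2000), restrict the data to the years of interest, and subtract the reference mean. As a minimal sketch, the recipe could be wrapped in one helper so it is written only once. `compute_anomalies` and the toy data are hypothetical names introduced here for illustration; they do not appear in the notebook.

```python
import pandas as pd

def compute_anomalies(annual, value_col, start_year, end_year, baseline=(1971, 2000)):
    """Return the rows for start_year..end_year with a TemperatureAnomaly column."""
    # Mean over the baseline period (both bounds inclusive)
    in_baseline = annual['year'].between(baseline[0], baseline[1])
    baseline_mean = annual.loc[in_baseline, value_col].mean()
    # Keep only the requested years and subtract the baseline mean
    in_range = annual['year'].between(start_year, end_year)
    result = annual.loc[in_range].copy()
    result['TemperatureAnomaly'] = result[value_col] - baseline_mean
    return result

# Toy annual means (made-up values) with a two-year baseline
toy = pd.DataFrame({'year': [1969, 1970, 1971, 1972],
                    'temp': [8.0, 9.0, 10.0, 12.0]})
toy_anomalies = compute_anomalies(toy, 'temp', start_year=1970, end_year=1972,
                                  baseline=(1971, 1972))
print(toy_anomalies)
```

With the notebook's variables, the equivalent calls would be `compute_anomalies(annual_means, 'LandAverageTemperature', 1870, end_year)` and `compute_anomalies(italy_annual_means, 'AverageTemperature', 1870, end_year)`.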
In [137]:
# Define the function to create a stripes plot
def make_stripes(df, label, color_range):
    # Create a new figure using plotly graph_objects
    fig = go.Figure()

    # Create a bar trace for the temperature anomalies
    fig.add_trace(go.Bar(
        x=df['year'], # x-axis values (years)
        y=[1] * len(df), # y-axis values (constant height for stripes), 1 for each year
        marker=dict( # use a custom color scale for the bars, colored by temperature anomaly using the marker attribute
            color=df['TemperatureAnomaly'],
            cmin=color_range[0],
            cmax=color_range[1],
            colorscale=[[0, '#02509D'], [0.5, 'white'], [1, '#CC1117']],  # custom blue to red
            line=dict(width=0)
        ),
        hovertemplate=label + " %{x}: %{marker.color:.2f}°C<extra></extra>",
        showlegend=False
    ))
    return fig
In [157]:
# Get consistent color range
cmin = min(annual_anomalies['TemperatureAnomaly'].min(), italy_annual_anomalies['TemperatureAnomaly'].min())
cmax = max(annual_anomalies['TemperatureAnomaly'].max(), italy_annual_anomalies['TemperatureAnomaly'].max())
color_range = [cmin, cmax]
print(color_range)
[np.float64(-1.2804361111111078), np.float64(1.1410638888888922)]
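One detail worth noting: the range printed above is not symmetric around zero, so the middle of the colorscale (white) falls at about -0.07 °C and not at an anomaly of exactly 0. If white should mean "equal to the baseline", one option is to widen the range so that it is symmetric. This is a sketch of an alternative, not what the notebook does; `symmetric_range` is a hypothetical name, and the two values are copied from the output above.

```python
# Values copied from the printed color_range above
cmin, cmax = -1.2804361111111078, 1.1410638888888922

# Use the larger absolute value on both sides so that 0 sits in the middle
limit = max(abs(cmin), abs(cmax))
symmetric_range = [-limit, limit]
midpoint = (symmetric_range[0] + symmetric_range[1]) / 2

print(symmetric_range, midpoint)
```

Passing `symmetric_range` in place of `color_range` to `make_stripes` would keep the shared scale between both panels while centering white on zero.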
In [139]:
# Create the stripes plot for both datasets
fig_world = make_stripes(annual_anomalies, "World", color_range)
fig_italy = make_stripes(italy_annual_anomalies, "Italy", color_range)
In [162]:
# Combine plots
fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    vertical_spacing=0.02,
    horizontal_spacing=0.0
)

# Add bar traces to each row
for trace in fig_world.data:
    fig.add_trace(trace, row=1, col=1)
for trace in fig_italy.data:
    fig.add_trace(trace, row=2, col=1)

# Assign y-axis titles and styling
fig.update_yaxes(
    title_text="World", row=1, col=1,
    title_standoff=0,
    title_font=dict(size=28),
    showticklabels=False,
    showgrid=False
)
fig.update_yaxes(
    title_text="Italy", row=2, col=1,
    title_standoff=0,
    title_font=dict(size=28),
    showticklabels=False,
    showgrid=False
)
# Update x-axis styling
fig.update_xaxes(tickfont=dict(size=24))

# Update overall figure layout
fig.update_layout(
    height=500,
    width=1200,
    title_text="Climate Stripes: Temperature Anomalies over Time for the World and Italy",
    title_font=dict(size=29),
    showlegend=False,
    hoverlabel=dict(bgcolor='white'),
    margin=dict(l=40, r=20, t=80, b=20),
    bargap=0,
    xaxis=dict(showgrid=False, tickmode='linear', tick0=1870, dtick=20)
)

fig.show()